Skip to content

WIP: PERF: Cythonize fillna #42309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed

Conversation

lithomas1
Copy link
Member

@lithomas1 lithomas1 commented Jun 29, 2021

Code is still pretty messy at this point, opening mainly for CI purposes. Also, this is my first time messing around with BlockManager so would appreciate a little extra scrutiny on that part.

Benchmarks

```
   before           after         ratio
 [9f6a91ae]       [77c05634]
 <master>         <perf-fillna>
  •  1.16±0.3ms         772±70μs     0.67  replace.FillNa.time_fillna(True)
    
  •  6.78±0.7ms       4.48±0.3ms     0.66  replace.FillNa.time_fillna_df(False)
    
  •  8.86±0.8ms       2.96±0.2ms     0.33  replace.FillNa.time_fillna_limit(False)
    
  •  7.92±0.6ms       2.46±0.6ms     0.31  replace.FillNa.time_fillna_df(True)
    
  •  6.15±0.5ms         810±60μs     0.13  replace.FillNa.time_fillna_limit(True)
    

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
before after ratio
[9f6a91a] [77c0563]

  •    644±10μs      1.15±0.04ms     1.79  replace.FillNa.time_replace(True)
    
  • 4.14±0.08ms       5.43±0.3ms     1.31  replace.FillNa.time_replace(False)
    
  •  4.10±0.2ms       2.30±0.2ms     0.56  replace.FillNa.time_fillna(False)
    
  • 1.24±0.09ms         682±20μs     0.55  replace.FillNa.time_fillna(True)
    
  •  6.98±0.4ms       2.61±0.4ms     0.37  replace.FillNa.time_fillna_limit(False)
    
  •  6.61±0.3ms       1.57±0.1ms     0.24  replace.FillNa.time_fillna_df(True)
    
  •  4.99±0.1ms         622±10μs     0.12  replace.FillNa.time_fillna_limit(True)
    

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

   before           after         ratio
 [9f6a91ae]       [77c05634]
 <master>         <perf-fillna>
  •  1.14±0.3ms         612±40μs     0.54  replace.FillNa.time_replace(True)
    
  •    7.22±2ms       1.61±0.1ms     0.22  replace.FillNa.time_fillna_df(True)
    
  •    8.97±2ms       2.00±0.2ms     0.22  replace.FillNa.time_fillna_limit(False)
    
  •  5.47±0.2ms         639±30μs     0.12  replace.FillNa.time_fillna_limit(True)
    

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

</Details>

# TODO: This seems to work for EAS, verify it does
return [
self.make_block_same_class(
values=self.values.fillna(value=value, limit=limit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll probably need to make NDArrayBackedExtensionArray.fillna use the new cython function(s)

# TODO: Verify EA case
if is_extension_array_dtype(value):
mask = value.isna()
value = np.asarray(value[mask], dtype=object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going straight to object may be unnecessarily heavy for e.g. Categorical value.

mask = value.isna()
value = np.asarray(value[mask], dtype=object)
libalgos.fillna1d_multi_values(
arr.values[mask], value=value, limit=limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to fill arr.values[mask] inplace, but i dont think arr.values[mask] shares data with arr.values?

@jbrockmendel
Copy link
Member

My main advice is to not worry about the Block stuff for now, just get the cython code working and write tests for it directly with ndarrays (and NDArrayBackedExtensionArray potentiall)

@lithomas1 lithomas1 added Performance Memory or execution speed performance Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jul 2, 2021
@lithomas1
Copy link
Member Author

@jbrockmendel Is it OK to do the NDArrayBacked.fillna changes in a followup, since this PR is already getting somewhat large?

@jbrockmendel
Copy link
Member

Good idea, yes.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 9, 2021
@mroeschke
Copy link
Member

Appears this PR has been dormant for a while closing to clear the queue. Feel free to reopen when you have time to work on this PR.

@mroeschke mroeschke closed this Oct 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Performance Memory or execution speed performance Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PERF/QST: Why isn't fillna cythonized?
3 participants